In this paper, we propose a noise-robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade-connected networks: an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN, and the output of the EN is used as the noise-robust feature. The EN and DN are trained in turn: when training the DN, noise types are used as the training labels, and when training the EN, all labels are set to the same clean-speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and compare it with MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve speaker verification performance under the clean condition.
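The alternating label scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes noise types are encoded as integers and reserves the label 0 for clean speech (both assumptions; the abstract does not fix an encoding).

```python
import numpy as np

# Assumed encoding: integer noise-type labels, with 0 reserved for clean speech.
CLEAN_LABEL = 0

def training_labels(noise_types, phase):
    """Return per-utterance targets for one training phase of the AN.

    phase == "DN": the discriminative network is trained to predict the
    true noise type of each utterance.
    phase == "EN": every target is replaced by the clean-speech label, so
    the encoding network is pushed to produce noise-invariant features
    that the DN cannot distinguish from clean speech.
    """
    noise_types = np.asarray(noise_types)
    if phase == "DN":
        return noise_types
    if phase == "EN":
        return np.full_like(noise_types, CLEAN_LABEL)
    raise ValueError(f"unknown phase: {phase}")
```

For a mini-batch with noise types `[0, 2, 1, 3]`, the DN phase keeps the labels as-is, while the EN phase maps them all to `[0, 0, 0, 0]`; alternating the two objectives is what drives the bottleneck representation toward noise invariance.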